On the effectiveness of random walks for modeling epidemics on networks

Random walks on graphs are often used to analyse and predict epidemic spreads and to investigate possible control actions to mitigate them. In this study, we first show that models based on random walks with a single stochastic agent (such as Google’s popular PageRank) may provide a poor description of certain features of epidemic spread: most notably, spreading times. Then, we discuss another Markov chain based method that does reflect the correct mean infection times for the disease to spread between individuals in a network, and we determine a procedure that allows one to compute them efficiently via a sampling strategy. Finally, we present a novel centrality measure based on infection times, and we compare its node ranking properties with other centrality measures based on random walks. Our results are provided for a simple SI model for epidemic spreading.


Motivation
Markov chain models of dynamic processes on graphs have been exploited in a number of successful applications, of which the most notable one is perhaps Google's PageRank algorithm [1]. Other successful applications include the modeling of power grids [2], connections of neurons in neuroscience [3], social networks [4], road networks [5], and epidemic networks [6].
These models rely upon the idea that a system may be modeled as a random walk, where the system transitions from state to state in discrete time-steps (possibly itself again) with a given probability that only depends on the current node of the system. In the aforementioned examples, this could be an Internet 'surfer' who chooses a link from one web-page to pass to the next web-page [7], or a car that at each intersection chooses the next road segment [5], or in electric circuits the incoming current at each junction is split in the out-going branches. In such examples, known parameters of graph theory and Markov chains may be used to infer interesting properties of the modeled case study, such as (for instance) mean first passage times, or the second-largest eigenvalue modulus, or Kemeny's constant. For example, in [5], Kemeny's constant is used as a measure of how well-connected the road network is, and the change in Kemeny's constant upon removal of a road segment from the network is used as a measure of the importance of that road, thus using a well-understood Markov chain parameter associated with the network to make design and control decisions in the system. Of particular interest is the consideration of spreading phenomena in a network: for example, the spread of disease in a contact network, the dissemination of rumours in an online social network, and so on. The starting point of this manuscript is the observation that a model based on a single random walker may not be strictly appropriate in such applications, and that one should interpret the properties of the random walk on a given graph carefully, rather than naively equating the behaviour of the random walk with the behaviour of the dynamical system it models.
In the literature, two main categories of random-walk models have been proposed to analyse epidemic networks. First, models based on a single random walker have been used, often derived from the expanding field of centrality measures [8]; they have been shown to provide interesting insight of the population networks; see, e.g., [6,9,10]. These models are particularly attractive because the relevant quantities are simple to compute also for relatively large networks. However, as we shall show in Section 3, these models fail to accurately predict the number of steps required to infect individuals in a network, as they do not consider the simultaneous presence of infected individuals in the network.
A second class of random-walk models includes models where several random walks occur simultaneously in the network, mimicking the actual dynamics of viruses and agents [11][12][13]. This approach fits into the general theory of agent-based methods: many variants have been proposed in the literature, sometimes with high degrees of sophistication; see, e.g., [14][15][16][17][18]. However, the math can be worked out to obtain closed formulas only in cases with particular topology or symmetries [19,20]. In most cases, the relevant quantities can only be estimated through intensive Monte Carlo simulations, and can not be computed exactly using close formulas. Hence these models often suffer from scalability issues.

Contribution
In this manuscript we investigate random-walk based models for epidemic spreading and provide the following contributions to the state of the art: 1. First, we show through several examples why models based on a single random walker fail to accurately reproduce the dynamics of a virus in terms of infection times; 2. Second, we consider a classic 2 n -state Markov chain SI model on a network, and establish closed-form expressions for computing relevant infection times. Such times are based on matrices that grow exponentially with the size of the population. Thus, we also provide an algorithm based on a sampling strategy that retains the actual mean infection times and is more computationally efficient than Monte Carlo simulations; 3. Finally, we propose a centrality measure called the mean infection covering time to rank nodes in a network by their influence in disease spread. Also, we compare it with other conventional indicators on several examples.
The paper is organized as follows: in Section 2, we give the mathematical formulation of Markov chains and random walks on graphs, along with the associated indicators that are used for quantitative and qualitative insights into systems modelled by these. In Section 3, we provide several motivating examples in which we compare naïve interpretations of random-walk parameters with simulated outcomes in the context of disease spread, in order to highlight some of the ways that this model does not capture the true dynamics. In Section 4, we describe a classic Markov chain model for disease spread which does effectively capture the disease dynamics. This model results in a Markov chain with 2 n states, as the state space describes the state (susceptible or infected) of every individual in the network with a binary representation. Such models do not scale efficiently with the size of the network, so we also provide an algorithm based on a sampling strategy to estimate mean infection times from one individual to another one, or from one group of individuals to another. In Section 5, we use the same concept to calculate the time for a single individual to infect the entire network, and propose these mean infection covering times (MICTs) as a centrality measure, and discuss how they compare with other random walk indicators that have been used in the literature for control measures such as vaccination or testing. Finally, in Section 6 we conclude our manuscript and outline interesting future lines of research.

Graph theory
A graph G is a collection of vertices V, indexed v 1 , v 2 , . . ., v n , with a set of edges E consisting of pairs of vertices {v i , v j }. If {u, v} 2 E, the vertices u and v are said to be adjacent; we also write u * v, and say that v is a neighbour of u. The degree of a vertex u, denoted by deg(u), is the number of neighbours of u. The adjacency matrix of a graph is the matrix Throughout this article, we consider only graphs which are simple (no loops or multiple edges between vertices), undirected (there is no orientation associated with an edge), and connected (for any pair of vertices u and v, it is possible to reach u from v via a sequence of adjacent vertices). We occasionally use 'network' and 'graph' interchangeably, and also substitute 'vertex' for 'node'. The convention in the literature is that a graph refers to the abstract mathematical object, while a network is rooted in the real world in some way.

Markov chains
Suppose that we model a system as a stochastic process in which, at any given time, the system occupies one of a finite number of states s 1 , s 2 , . . ., s n , and transitions between these states in discrete time-steps with some fixed transition probabilities; that is, t i,j denotes the probability of occupying s j in the next time-step, given that the system is currently in s i . This may be represented as a sequence of random variables {X(t)jt = 0, 1, 2, . . .}, indexed by time-step and taking values from the state space {s 1 , s 2 , . . ., s n }. Implicit in the description above is the so-called Markov property: Note that P½A j B� denotes the conditional probability that event A occurs, given that event B occurs. The above re-states the assertion that the probability the system occupies a given state in the next time-step (at time t = k + 1) depends only on the current state of the system (at time t = k), and not on the state of the system in any previous step.
The probability transition matrix T = [t i,j ] is central in the analysis of the behaviour of the Markov chain. The (i, j) entry of T k gives the probability that the system occupies the state s j of the chain after exactly k time-steps, given that the initial state was s i . Given an initial probability distribution vector u = [u 1 u 2 � � � u n ] > in which u i is the probability of occupying s i initially, the vector u > T k gives the probability distribution across the state space at time k. Under certain conditions on the transition matrix T (primitivity), the Perron-Frobenius theorem indicates that as k ! 1, and independently of the initial distribution u, u > T k converges to the unique stationary distribution π > of the Markov chain, which may be calculated as the unique left eigenvector of T corresponding to the eigenvalue 1, normalized so that the entries sum to 1. Since u > T k represents the probability distribution at time k, this stationary distribution π represents the long-term probability distribution across the states; that is, π i represents the probability of the system occupying the state s i in the long run, or the proportion of time spent in s i .
To quantify the short-term behaviour of the system modelled by a Markov chain, we consider mean first passage times. For a Markov chain {X(t)jt = 0, 1, 2, . . .}, the first passage time from s i to s j is the random variable F i,j taking on the value of t for which X(t) = s j , given that X (0) = s i and X(k) 6 ¼ s j for all k = 0, 1, . . ., t − 1. The mean first passage time, then, is the expected value of F i,j , denoted m i,j . While this definition is probabilistic in nature, it can be shown that the mean first passage times can be computed using the transition matrix T: Here e i denotes the i th standard basis vector, 1 the all-ones vector, and T (j) denotes the principal submatrix of T obtained by deleting the j th row and column. Here m i,j is calculated as the row sum of (I − T (j) ) −1 corresponding to s i . We note that in this article we take the convention that m i,i = 0; alternatively we can define the mean first return time to s i , which can be shown to be equal to 1 p i . The matrix of mean first passage times is the matrix M = [m i,j ], and while each entry can be computed as above, it is also well-known that where Z is the so-called fundamental matrix of the Markov chain (see [21]), Z dg is the diagonal matrix whose entries consist of the diagonal entries of Z, and W is the diagonal matrix whose entries consist of the entries of π.
Given an irreducible Markov chain with transition matrix T, stationary vector π, and mean first passage matrix M = [m i,j ], one can define, for a fixed index i, the quantity This can be interpreted as the expected time to reach a randomly-chosen state j, starting from a fixed state i. Introduced in the 1960s in [21], this was remarkably shown to be independent of the choice of initial state i. As such, it is named Kemeny's constant, and is denoted as KðTÞ. Noting that p > 1 ¼ 1, it can be shown that admitting the interpretation of KðTÞ as the expected time of a random trip in the Markov chain, where the initial and terminal states of the trip are chosen at random, with respect to the stationary distribution.
A random walk on a given graph G = (V, E) is an example of a Markov chain. A random walker traverses the vertices of G, at each step choosing an adjacent vertex to move to uniformly at random. Thus the state space consists of the vertices v 1 , . . ., v n , and t i;j ¼ 1 degðv i Þ whenever {i, j} 2 E. Letting D be the diagonal matrix of vertex degrees, and A(G) the adjacency matrix of G, the probability transition matrix is given by T = D −1 A(G). Note that the stationary distribution vector is Kemeny's constant for the simple random walk on a graph G can be interpreted as a graph invariant indicating the 'connectedness' of the graph [5,22], or how fast information 'mixes' in the graph [23,24].

Absorbing Markov chains
A state s j of a Markov chain is called absorbing if t jj = 1; thus when the chain enters state s j , it remains there in every subsequent time-step. A Markov chain is called an absorbing Markov chain if its state space contains at least one absorbing state, and if for every s i which is not absorbing, there exists some absorbing state s j and some k > 0 such that t ðkÞ i;j > 0; that is, there is a nonzero probability that the Markov chain will reach one of the absorbing states in finite time, for any the initial state. While the analysis of irreducible Markov chains focuses on the limiting stationary distribution and mean first passage times, for absorbing Markov chains the limiting behaviour is always that the chain is eventually absorbed into one of the absorbing states, and one concerns oneself with the probabilities of absorption into the different absorbing states, and the expected time to absorption from some initial transient state.
Suppose the states are indexed so that the absorbing states are listed last in the ordering. This produces a block transition matrix as follows: ; where Q is a square matrix representing transitions between transient states, R represents transitions from transient states to absorbing states, and I is the identity matrix, whose order is determined by the number of absorbing states. The analysis of the behaviour of the chain before absorption occurs centres around the computation of the matrix (I − Q) −1 , the so-called fundamental matrix for absorbing chains (see [21]). Since the (i, j) entry of this matrix captures the expected number of visits to the j th transient state before absorption, given that the chain starts in the i th transient state. Thus the i th row of (I − Q) −1 R produces the probability distribution for the eventual absorbing state the chain ends up in, given that it starts in the i th transient state, and the expected time to absorption given that the chain starts in the i th transient state is computed as the i th row sum of (I − Q) −1 , or e > i ðI À QÞ À 1 1.

Remark 2.1.
Note that for an irreducible Markov chain, the expression for mean first passage times in (1) can be derived using these absorbing chain techniques described above, by designating the state s j as an absorbing state and replacing the j th row of T by zeros with a 1 in the j th position. Then the mean first passage times can be computed as the expected time to absorption, where the matrix Q is the principal submatrix T (j) . Furthermore, if one requires the expected time to reach a collection J of states, this can be determined by the appropriate row sum of ðI À T ðJ Þ Þ À 1 , where by T ðJ Þ we denote the principal submatrix of T with the rows and columns indexed by J removed. This is a key observation which we will use to calculate mean infection times in the 2 n -state model in Section 4. This result is well-known in the literature; see for example the discussion found in [25].

Epidemic modelling
In this article our focus is on exploring the ability of Markov chain models to reflect the dynamics of virus spreading in populations. For this purpose we shall only consider elementary SI epidemic models for the disease, where individuals may either be susceptible (S) or infected (I); and once an individual is infected, it remains infected indefinitely [26]. If two individuals are in contact and one is susceptible and the other infected, there is a fixed infection probability β that the susceptible individual also becomes infected. Throughout this work, when simulations are done we assume the infection probability is β = 0.1. Compartmental models of disease spread such as SI models traditionally make the assumption that the population is well-mixed, which means that every individual comes into contact with every other individual (making β the rate of transmission). In a network representation of the population, each individual is represented by a node, and an infected individual may infect another susceptible individual in the next time-step if the two of them are adjacent. This underlying graph is a contact network determining which individuals in the population have been in close contact that is sufficient to facilitate the spread of disease, and this network can be inferred from data in some way (see for example [27]).
Given a connected graph and at least one initially-infected individual, at some point all of the individuals in the network will be infected as well. That is, the equilibrium of the model is the state in which every individual is infected. The focus of our analysis of the above model of epidemics is on quantities dictating how fast the disease will spread. By considering quantities such as the time until the entire network is infected (i.e. until equilibrium is reached), or the time until a particular individual becomes infected, we can better measure and understand the role of an individual node in the dynamical process. We note that these concepts may apply naturally to other domains such as fake news spreading in online social networks, as similar epidemic models have been used in [28,29] regarding such applications.

Infection time predictions for single random walk models
In this section, we show that there is a discrepancy between the infection times predicted by random walk models with single walkers and the actual times observed in SI models. We first consider the graph in Fig 1 as a supporting example to clarify our discussion.
The probability transition matrix for the simple random walk on this graph is : In comparing this random walk with the movement of disease, there are some issues to note. If the (i, j) entry of T RW is naïvely interpreted as the probability of the disease spreading from individual i to individual j, then immediately we see that a contact of the same duration between individuals 1 and 4 produces two different probabilities t 14 and t 41 , due to the different degrees of the two nodes.
To overcome the problems of having contact of the same durations associated with different probabilities of spreading the virus, we can consider modifying the transition matrix to create

PLOS ONE
On the effectiveness of random walks for modeling epidemics on networks a symmetric one, which we denote T sym , adapting the example of [30]. The following definition can be used to account for the infection probability and constructs a symmetric probability transition matrix, so that the probability of i to infect j is the same as that of j to infect i. In fact, the matrix is defined so that every off-diagonal entry is equal. This matrix is necessarily doubly-stochastic (both the rows and columns sum to 1). We define T sym ¼ aAðGÞ þ ðI À aDÞ; for some 0 < a � 1 d max where d max is the maximum node degree, and D is the diagonal matrix of vertex degrees. Note that αA(G) is a substochastic matrix with zero diagonal, while I − αD is a diagonal matrix. One can interpret this definition as a transition matrix for a random walk on a graph where, at each vertex, the random walker has some fixed constant probability α of choosing any of the neighbours of the current vertex, but with the residual probability can choose instead to remain in place. For the graph in : Remark 3.1. We note that in this example, every off-diagonal entry is equal to the infection probability β = 0.1 that we wish to use in the disease model. In the case that the desired infection probability is larger than 1 d max , it is not possible to choose t ij = β for transition probabilities between individuals joined by an edge. However, we remark here that mean first passage times between distinct states for transition matrices of this form scale in a natural way with varying choices of α. That is, if Thus for the sake of proportional comparison of mean first passage times with the infection times obtained from simulations, the value of α is irrelevant.
We now intend to uncover an inaccuracy of the models based on random walks. For this purpose, we compute the mean first passage times m i,j for a random walk on the graph. To compare, we also extensively simulate the spreading of the virus, and compute from simulations the average time M i,j for individual j to be infected, given that we start with only individual i infected. These are the values which are compared in order to examine the efficacy of random-walk-based methods.
Let M RW and M sym be the matrices of MFPTs for the random walk and the symmetric random walk on the graph in Fig 1, respectively. Then, M RW and M sym can be computed using (1) or (2) to produce: : On the other hand, we can estimate the time for an initially-infected individual i to cause individual j to become infected using simulations. Assuming that the probability of spreading the virus is equal to 0.1, we ran 100000 Monte Carlo simulations with Algorithm 1.
We estimated a matrix M inf of these expected times regarding infection as follows: : ð4Þ Apart from scaling factors (one could normalize M RW or M sym for comparison purposes), it is still obvious that there is a disparity between the estimated values and the theoretical mean first passage times associated with the simple or symmetric random walks. Different predictions are obtained in general; as a single example, the ratio of m 1,3 to m 1,4 in M RW and the ratio of the corresponding entries in M sym are both close to 2, while the ratio of the corresponding entries in M inf is close to 1.
We consider another example which arises in the context of traffic flow in urban road networks, another domain of application in which random walks have been shown to be an effective model. Let L(m, n) denote an m × n lattice graph or grid graph; see Fig 2. We now consider a random walk on this lattice, under the assumption that the next state is chosen among the available neighbours with the same probability p = 0.1, while with the residual probability the state will not change (i.e., the random walker stays in place). For instance, this may correspond to a vehicle travelling in a lattice-like road network, where edges correspond to roads and nodes to intersections. At each intersection, the car will choose the next road with the same probability 0.1, and with the residual probability will remain in the same place. For example, for nodes in the middle of the lattice L (3,4) in Fig 2 which have 4 neighbours, this corresponds to saying that with probability 0.6 the state will not change in one step, and with probability 0.4 it will change state, with the same probability 0.1 of choosing any one of the four neighbours. In this case, it was shown in [5] that the mean first passage times (MFPTs) to travel from any state to any other state according to such a random walk, can be computed by using the transition matrix T given by T = 0.1A(G) + (I − 0.1D) where G is L (3,4). The values of such MFPTs are depicted on the left of Fig 3. We now consider the same graph, but we assume now that the graph depicts the social interactions between individuals, and we compute how long it would take for a virus to spread from individual i to individual j in reality. We assume again that the probability of spreading the virus in one step is β = 0.1, but in this context, one individual can infect more than one neighbour in a single step. Accordingly, the mean time to infection from one individual to any other individual are shown on the right of Fig 3. Mean infection times (MITs) from i to j are computed using Algorithm 1 under the assumption the i is the only infected individual at the first time-step, and it is straightforward to see that such times are much smaller than those that would have been estimated through a conventional random walk model.
Another example that displays clearly how MFPTs and MITs are qualitatively different is the following.
Example 3.2. Consider a complete graph on k nodes, and to one of these nodes (which we shall call A) attach a new node (which we shall call B). An example with k = 5 is depicted in  We shall show that the MIT from node A to B is constant, while the MFPT between the same two nodes grow quadratically with k.
Suppose that A is the only initially infected individual. Then, at each time-step B has a probability β of becoming infected, hence the MIT from A to B is the expected value of a geometric distribution with parameter β, that is, . This probability is independent of k. On the other hand, the mean time needed for a random walk starting from A to reach B increases as k grows: indeed, for large k, the random walk on the clique takes more time to return to A, since there are more nodes available to visit. More precisely, let M be the mean first passage time from A to B. Starting from A, the random walk either reaches B immediately in time 1, with probability 1 k , or moves to another node of the clique with probability kÀ 1 k . Then, at each subsequent time-step, it has probability 1 kÀ 1 to return to A, hence the time of first return to A from another node of the clique is E Geo 1 After the process has returned to A, by the Markov property we need again an average time M to reach B. This argument produces the equation M ¼ 1 k 1 þ kÀ 1 k ð1 þ ðk À 1Þ þ MÞ, which can be solved for M. Hence, M = k 2 − k+ 1.

Discussion
Even in very simple examples, it is obvious to see that modelling the spreading of the virus in a population as a random walk with a single agent leads to some coarse approximations in terms of the expected mean first passage times. This is rather important to note, as many researchers have used indicators based on random walks, and MFPTs explicitly, in a number of epidemic applications (e.g., vaccination or testing strategies). For instance, this comment holds for indicators such as Kemeny's constant [6], or the random walk centrality (RWC) [31]. Also, L(3, 4).

Fig 3. Comparison of mean first passage times for T and mean infection times in
https://doi.org/10.1371/journal.pone.0280277.g003 random walk betweenness (RWB), originally introduced by Newman [32] is a classic indicator based on random walks that is known to be one of the best indicators for which individuals should be vaccinated in a population [33].
All such indicators may still provide interesting insights in the dynamics of a virus based on the underlying structure of the contact network-for example, in [6] it is shown that Kemeny's constant can be used to detect bridges between disparate communities in contact networks. The effectiveness of these indicators lies in the fact that they are obviously based on the topological structure of a population and of its average contacts; still, one may wonder about the impact of decisions that could be made on more precise models of the disease dynamics, rather than simply the network structure which is revealed through random walk methods. While we do not come to a conclusive answer to that question in this article, we provide the tools by which one can appropriately model and measure the disease spread using a different Markov chain model, and we explore some examples for which random walk indicators disagree with those determined from the more intuitive model.

Exact Markov chain models and efficient algorithms to compute mean infection times
We suggest that the primary reason that a random walk seems to be ineffective as a model for disease spread in a network is that previously-infected individuals remain infectious throughout the process, continuing to infect their contacts and affecting the overall dynamics of the disease in future time-steps. Even multi-agent systems modelled with multiple random walkers will not accurately capture this aspect of the disease spread context. As such, we require a model which retains information about every individual's status at once. In this section, we describe such a model which is known in the literature, and which has been used in a variety of settings and in many forms. We will review some of these results, and our contribution to the literature is to then define and investigate an analogue of mean first passage times in this setting. We determine how to compute these exactly (the values which were simulated in Section 3), and discuss some computational issues, as well as determine an algorithm in Sections 4.2 and 4.3 to effectively approximate these in the case that it is computationally expensive to compute exactly. We then proceed in Section 5 to establish how these quantities can be developed to determine a centrality indicator for use in control measures in disease spread settings.

2 n -state model of epidemic dynamics
Let G = (V, E) be a graph, and suppose its vertices are labelled v 1 , v 2 , . . ., v n . Let X(t) = (X 1 (t), . . ., X n (t)) 2 {0, 1} n where 0; otherwise: Since we consider the SI model, we assume that the probability of recovery is 0; hence if X i (t 0 ) = 1, then X i (t) = 1 for all t > t 0 (models with nonzero recovery probabilities are used in [34][35][36]). Assume that in any contact between a susceptible individual and an infected individual, the probability of infection is some constant β. Then, {X(t)jt = 0, 1, . . .} is a discrete-time, time-homogeneous Markov chain with a finite state space S = {0, 1} n ; that is, each state is represented by a binary vector or n-tuple in which the kth entry is 1 if v k is infected, and 0 if v k is suspectible. For a subset A � V, we denote by s A the n-tuple in S whose k th component is 1 if v k 2 A, and zero if v k = 2 A. For simplicity, we write s {vi } as s i . In what follows, we assume that P½Xðt þ 1Þ ¼ s i j XðtÞ ¼ ð0; . . . ; 0Þ� ¼ 1 n for 1 � i � n, so that if initially no one is infected, an individual is chosen uniformly at random to be infected in the next step. Then, {X(t)} is an absorbing Markov chain with exactly one absorbing state s V = (1, . . ., 1).
Suppose that node v i is susceptible, and has n i infected neighbours at time t. Since the infections are independent, the probability that v i is not infected at time t + 1 is ð1 À bÞ n i . Hence Then, again by the independence of infections, for each x, y 2 S, the transition probability from x to y is given by We note that if y does not have at least the same infected population as x, then P½Xðt þ 1Þ ¼ y j XðtÞ ¼ x� ¼ 0. This means that the states can be ordered in such a way that the probability transition matrix T is an upper triangular matrix. We now provide a method to obtain the expected time for a group of some infected individuals to infect-directly or indirectly-another group of susceptible individuals. That is, given a subset of individuals A who are initially infected, we compute the expected time until a subset B of individuals are infected, where B � A. If we choose A = {v i } and B = {v i , v j }, this enables us to exactly compute the matrix M inf in (4). We define terminology for such expected times in order to distinguish from the mean first passage times used in the previous sections.  where e s A is the column vector whose entry corresponding to the state s A is 1 and zeros elsewhere; T (SB ) is the principal submatrix of T obtained by removing rows and columns corresponding to states in S B ; and 1 is the all-ones vector. This general expression can be used to calculate the entries of the MIT matrix for a given network, and can be used to calculate the mean infection covering times with the set A representing a single vertex, and B is the entire vertex set. The remainder of this article focuses on these two cases, but there are many uses for (7) in the context of disease spread and other spreading processes. For example, by setting S B to be the set of all binary vectors with more than p% ones (i.e. states in which more than p% of individuals are infected), and A to be some initially-infected set of interest, we can determine the time until some threshold of infections is reached (for some diseases, this is of interest for herd immunity).
Due to the increased size of the state space (from n states to 2 n states, where n is the size of the population), the calculation of these mean infection times becomes computationally expensive. In the remainder of this section, we discuss some computing strategies for the exact computation, then introduce and discuss approximation strategies in Sections 4.2 and 4.3, respectively.
We first note that we need not consider the entire matrix T S B ð Þ for finding μ A, B . In an SI epidemic model, for any state x such that x k = 0 for some k 2 A, the transition probability from s A to x is 0. That is, if S A is the set of states consisting of binary vectors with ones in the positions corresponding to A, and � S A ¼ S n S A is its set complement, then if the initial state is s A we may remove from consideration any states of the Markov chain from � S A without affecting the outcome of the computation. These correspond to states in which some individual who was initially infected is now not infected, which is impossible in the SI model. Hence, instead of T S B ð Þ in (7), we may use the matrix T ð � S A [S B Þ obtained from T by deleting rows and columns corresponding to the states in � S A [ S B . This substochastic matrix has rows and columns indexed by the states in S A . Therefore, Note that e s A , I and 1 are re-sized appropriately (their size determined in context), and that e s A is the standard basis vector with a 1 in the position corresponding to the state s A , according to where it appears in the reduced state list. Now let us consider a mean infection covering time μ {vi }, V. The size of the matrix T is exponential, so it is impractical to store data for T when the number of nodes in the corresponding graph is sufficiently large. Note that T is an upper triangular matrix, and computational experiments show that the sparsity of T is between 1% and 5% for Erdős-Rényi random graphs on 10 nodes with probability 0.5. Thus one could consider using a Sparse Triangular Matrix Solver (SpTrSV) (see [37] for a brief introduction and algorithms). In order to implement SpTrSV, it requires particular data formats [38] such as the compressed sparse row (CSR) or the compressed sparse column (CSC), which are two formats for storing nonzero entries of a sparse matrix into three row vectors. However, as the size of the matrix increases exponentially, the number of nonzero entries would increase exponentially since the graphs we consider are connected. Hence, SpTrSV would also eventually become infeasible for the computation of mean infection covering times as in (10).
In order to see how using the transition matrix T might not be practical, we consider a particular example of a tree, which is a minimally-connected graph (that is, it has the least number of edges necessary to be a connected graph). Let G be a star with vertex set {1, . . ., n}-that is, one vertex is of degree n − 1 and the others have degree 1. Suppose that vertex 1 is of degree 1; and vertex 2 is of degree n − 1. Assume that X(0) = (1, 0, . . ., 0) and consider the 2 n−1 × 2 n−1 submatrix of the transition matrix. It can be seen that there are 2 n−2 + 1 nonzero entries on the off-diagonal. In empirical settings, the storage of data like this becomes an issue. Even though G is one of the sparsest graphs, we may not be able to calculate explicitly our desired mean infection times for sufficiently large orders of G by using (10).

A sampling strategy for the 2 n -state model
In this subsection, we give an alternative construction of the 2 n -state model, which allows us to use a sampling method for more efficient approximation of the mean infection times, and can be used to prove a nontrivial symmetry property. This construction is a discrete-time analogue of the continuous version appearing in [39,Theorem II.2] which the authors describe as 'folklore in some circles'. Let us consider a single edge {v i , v j } 2 E in the network. Assuming that one of its two endpoints is infected and the other is not, the time τ i,j taken for the infection to spread across the edge follows a geometric distribution, i.e., P½t i;j ¼ k� ¼ ð1 À bÞ kÀ 1 b for each k = 1, 2, . . .. We call τ i,j the potential infection time over the edge i, j, because we are working under the assumption that the infection spreads over this edge of the network from one of its endpoints to the other.
In this new alternative construction, we first assign independently a value τ i,j to each edge {v i , v j } 2 E, sampling according to the geometric distribution described above. Once these values τ i,j have been chosen, we can reconstruct the dynamics of the infection starting from an initial infected setXð0Þ ¼ s A for some A � V: an infection happens across edge {v i , v j } after time τ i,j if one of the two endpoints is infected and the other is not. Hence the infection spreads from a vertex v k to another vertex v ℓ in time t equal to the graph distance dist(v k , v ℓ ) on the graph G, with weights (edge lengths) given by τ i,j .
The Markov process that describes the infected individuals at time t in this construction is thusXðtÞ ¼ s fv2V : distðA;vÞ�tg , where dist(A, v) = min w2A dist(w, v). An example of possible sample values of the potential infection times τ i,j and the resulting dynamic of the processXðtÞ is depicted in Fig 5. The following proposition shows that this alternative construction provides the same result. This result could be considered intuitive by some, but we provide a formal proof.
Theorem 4.4. The stochastic processes X(t) andXðtÞ (with the same initial state Xð0Þ ¼Xð0Þ ¼ s A ) are equidistributed.
Proof. We shall show that the transition probabilities fromXðtÞ toXðt þ 1Þ coincide with those in (6) and are independent of previous history. Let us condition on the stateXðtÞ ¼ x, and consider the probability thatXðt þ 1Þ ¼ y has its j th component equal to 0, i.e., that dist (A, v j ) > t + 1. If x j = 1, then dist(A, v j ) � t, hence it is impossible (probability 0) that this distance is larger than t + 1. Otherwise, let us call n j the number of edges that join v j with another node v i with x i = 1. For v j to stay un-infected, it must be the case that dist(A, v i )+ τ i,j > t + 1 for each such edge; and since we are assuming that x j = 0 it must already be the case that dist (A, v i ) + τ i,j > t. By the memoryless property of the geometric distribution, and τ i,j is independent from the potential infection times of all other edges by construction. Hence Hence we have proved the last two cases in (5); the first two cases follow by difference. The product formula (6) follows from the fact that each factor depends on different potential infection times τ i,j , which are independent.
The alternate construction leads immediately to an algorithm for more efficient Monte Carlo simulation of the system, which we describe in Algorithm 2. Some remarks are in order.
• The most expensive part of Algorithm 2 is the all-pairs shortest-path matrix computation, for a total cost of O(N|V||E|) using the algorithm in [40] where N is the number of simulations. We report in Section 4.3 on how the computation time compares with both the exact computation of (8) and the Monte Carlo simulations of Algorithm 1, and also discuss accuracy and precision.
• As the M (k) are repeated samplings of a random variable, the central limit theorem (see, e.g., [25,Theorem 4.4.1]) applies, hence the estimateM inf computed with this algorithm is affected by an error jðM inf Þ ij À ðM inf Þ ij j that scales as where σ ij is the standard deviation of the infection time from i to j. This standard deviation can in turn be estimated using the computed samplings, providing an estimate of the error.
• With the same strategy one can estimate not only the MICT μ, but any quantity that depends only on the infection times; for instance, the mean time after which a given percentage p% of the nodes becomes infected.
• The construction that leads to Algorithm 2 can be generalized easily to directed graphs, and to deal with edge-dependent infection probabilities β i,j (as long as they are independent from one another). In fact, the construction can also be generalized to any distribution of the potential infection times τ i,j , not necessarily a geometric one. This provides an avenue to incorporate a large number of extensions into this simple model, from superspreaders to SIR. For instance, a disease with a recovery time of T can be modeled by choosing a distribution such that P½t i;j > T� ¼ 0.
• From this alternate description, one can prove easily the following result, which is a discretetime analogue of part of [39, Theorem 2.3].

Corollary 4.5 On an undirected graph (possibly with edge-dependent infection probabilities
proof. The MIT matrix M inf is the mean over all possible samples t ðkÞ i;j of the sample MIT matrix M (k) , which is symmetric for each k.
We remark that in general for random walks on graphs, the mean first passage matrix is typically not symmetric (i.e. m i,j 6 ¼ m j, i ), further delineating the contrast between mean first passage times obtained from simple random walks, and the mean infection times developed here.

Accuracy and computational burden of the sampling strategy
For actual computation of mean infection times for a network, we have to deal with a transition matrix whose order is exponential with respect to the number of vertices of the network, as seen in (8). The computation is only feasible up to graphs of order n � 15, due to storage problems. So, it is necessary to turn to algorithms to estimate MITs. Hence, we compare the two Monte Carlo estimations for mean infection times, in terms of speed and accuracy. In order to see which algorithm is more accurate, we need to have the actual MITs, so we shall consider various graphs on at most 12 vertices. For each of the two algorithms, we report in Table 1 the mean μ and the variance σ 2 of relative errors for entries between the actual and estimated MIT matrices, running 300 simulations.
We now compare running times. The running time of Algorithm 1 mainly depends on the infection probability-that is, the smaller β is, the longer each simulation of the time taken for an individual j to become infected takes. In contrast, the most expensive part of Algorithm 2 is to compute the all-pairs shortest-path matrix, which does not depend on infection probability. For demonstration, we pick four contact networks [41], which are available at https:// networkrepository.com. When a network is disconnected, we choose its largest component. In Table 2, we measure the computation times of the MIT matrix, running 100 simulations/samplings with different infection probabilities 0.2, 0.1, and 0.05. Table 2 shows that the approach based on the sampling strategy in Algorithm 2 is very efficient and scales very well with the size of the population of individuals, thus paving the way for a Markovian analysis of epidemic networks that does not suffer from the curse of dimensionality.

Exploring the ranking of nodes according to mean infection covering times
Ranking nodes in a graph has many applications in epidemic models, in terms of control strategies such as testing, vaccinating, and so on (see [6,30,33]). In this section, we investigate the mean infection covering times as defined in 4.2 and how they can be used as a measure of how pivotal each node is in the network to the spread of disease. The mean infection covering time of v i is calculated using (8) and the observations following it as Then, m i indicates the expected time for all individuals to get infected, starting from an initially infected node v i . We note that the author of [12] investigated the mean infection time for the case that the infection probability is 1. Moreover, as an analogous notion of the mean infection times, one may consider the cover time of a Markov chain [42], which is the expected value of the first time at which all the states have been visited.

Comparison with other indicators on 2-community networks
Here we compare what nodes are regarded as 'most important' according to different ranking indicators. In many social networks, node degree is an important metric to determine popular nodes, i.e., individuals who typically meet a large number of other individuals during a day, or who have many social ties on online social platforms. At the same time, it is also known that in networks with strong community structures, immunization interventions targeted at individuals bridging communities (e.g., families, school classes, working environment) are more effective than those simply targeting highly-connected individuals [33]. The reason for this is that regardless of the number of contacts one has, an individual bridging communities may pass the virus from one community to another community, and cause the infection to spread to a fully-susceptible community.
In what follows, we shall consider four random-walk based indicators or centrality measures, namely, the mean infection covering times introduced in this section, Kemeny's constant, random walk betweenness (RWB), and random walk centrality (RWC). Here we provide formulae for RWB and RWC; see (3) for the description of Kemeny's constant.
The random walk betweenness of node i in a network, denoted b i , measures the 'expected net number of times a random walk passes through vertex i on its way from a source vertex s to a target vertex t, averaged over all s and t', which is given [32] by where I ðstÞ i is the current flow through vertex i starting from vertex s and ending at vertex t. This method for computing random walk betweenness depends on considering the graph as an electrical network in which each edge has a resistance of 1, and calculating the flow through each vertex (see [32] for details). This is then shown to be equivalent to the random walk interpretation given above.
The random walk centrality of node i in a network quantifies how easily or how quickly a random walker arrives at i from elsewhere in the network [31], which is given [43] by the reciprocal of the so-called accessibility index α i of node i: where π k is the stationary distribution for node k and m k, j is the mean first passage time from k to i. The accessibility index can be interpreted as the expected time for the random walker to arrive at node i, starting from a randomly-chosen node j, where this starting node is chosen with respect to the stationary distribution vector (i.e. with respect to the degree of the node). Since the random walk centrality of a node i is 1 a i , a large value for RWC indicates that the vertex in question is highly central, where 'centrality' is understood in terms of random walks terminating at a target vertex. Regarding Kemeny's constant, we score important nodes according to how their removal from the network affects the value of Kemeny's constant: c i ¼ KðG 0 Þ À KðGÞ where G 0 is the graph obtained from G by removing node i. That is, the higher the increment in Kemeny's constant after deletion of a node is, the more critical the node is (recall that large values of Kemeny's constant correspond to graphs which are not well-connected).
In the comparison that follows, we estimate MICTs by Algorithm 2 with 300 samplings. For RWB, RWC and Kemeny's constant, larger values of the indicator correspond to more important nodes, while the reverse occurs for MICTs (i.e., lower values of MICT correspond to more important nodes).
As a first example, consider the lattice graph L(6, 6). Fig 6 shows that all indicators agree that nodes placed in the center of the structure are the most important nodes.
We now consider networks formed by two communities as follows: given two lattice graphs L(n 1 , n 1 ) and L(n 2 , n 2 ), we let an edge e 1 join a vertex v 1 at a corner of L(n 1 , n 1 ) and a vertex v 2 at a corner of L(n 2 , n 2 ), and we also let e 2 join a vertex adjacent to v 1 in L(n 1 , n 1 ) and a vertex adjacent to v 2 in L(n 2 , n 2 ). We denote by G(n 1 , n 2 ) the resulting 2-community structure, and we shall refer to e 1 and e 2 as the community bridges. In particular, we now consider two types of 2-community structures: (1) two groups have the same size (i.e., G(7, 7)), and (2) one of two groups has a larger size than the other (i.e., G(10, 3)). Let us first consider the case of G(7, 7). As seen in Fig 7, all indicators show that the most critical nodes are positioned near the community bridges, and in particular, vertices v 1 and v 2 are the most critical ones. However, RWC indicates that the third and fourth ranked nodes are not incident to e 2 , while the other indicators do. Moreover, Kemeny's constant suggests that nodes farther away from the community bridges have similar criticality scores of the nodes near the bridges; while MICTs provide more diversified scores that change gradually (in terms of criticality) as nodes are farther away from the bridges.
We finally consider the case of G(10, 3), shown in Fig 8. The four cases now provide different indications about what should be classified as the most critical nodes. In particular, the positions of the most critical nodes according to Kemeny's constant, RWB, MICT, and RWC, respectively, vary in distance from the community bridges toward the central area of the larger community. This result appears to suggest that a criticality measure based on Kemeny's constant via node removal could be useful for detecting community bridges regardless of the sizes of the groups in a community structure (which is in accordance with [6]). Conversely, RWC appears to rank highly the nodes that lie in the central area of the relatively larger community. MICT and RWB provide intermediate indications that take into account both the sizes of the groups and the presence of bridges (where RWB gives more importance to the community bridges than MICT). Finally, the values of the scores from MICT are spread more evenly across their (linear) scale.

Comparison with other indicators on a contact network
In this subsection, we present a contact network 'ia-infect-dublin' with 410 nodes and 2765 edges (available at https://networkrepository.com), and see how the random walk-based indicators suggest critical nodes based on our understanding in the previous section. Instead of colouring nodes according to indicators' values, we display graphs with colouring by their own rankings. For MICT computations, we ran 300 simulations. Regarding computation times, it takes approximately 32.5, 1.5, 0.15, and 52.8 seconds for ranking nodes based on Kemeny's constant, MICT, RWC, and RWB, respectively. This suggests that node centrality for Kemeny's constant and RWB would be infeasible as the number of nodes in a network is larger.
As discussed in the previous section with 2-community structures, and indeed as seen in Fig 9, MICT and RWC are inclined to highlight important nodes in the central area rather https://doi.org/10.1371/journal.pone.0280277.g008 than nodes nearby community bridges, while Kemeny's constant and RWB are not. As a result, red-coloured nodes for MICT and RWC are more clustered in the central area while ones for Kemeny's constant and RWB scatter around community bridges.
We can understand their different behaviours between Kemeny's constant and RWB, and between MICT and RWC, considering communities in peripheral areas. Consider the community at the bottom left. Then, we can see that Kemeny's constant does not suggest nodes in the centre of the community as important ones while RWB does, for the reason that the Kemeny-based indicator is the most affected by community bridges as seen in Fig 8. Consider two communities at the bottom left and the top right. We can see that MICT ranks less important nodes in the communities than RWC does. Since RWC is the least affected by community bridges among the random walk-based indicators from Fig 9, one can interpret the difference in that MICT ranks more important nodes in the central area because they have more community bridges.
Finally, since one might expect that the result of rankings from MICT is relevant to node degrees, we consider ranking nodes according to node degrees. Its result is the most similar to the one from RWC.

Conclusions and future work
Random walks on graphs have been widely used in the literature to analyse and predict epidemic spreads, and to plan, design and compare the outcomes of different mitigation policies in terms of their ability to control the spreading of a virus. As we show in this manuscript, simplistic models based on single random walkers are very convenient in terms of computational aspects, and thus have a significant predictive power in networks of realistic size. However, they are not consistent with the actual dynamics of the virus, in terms of the number of steps which is actually required before the virus spreads from one individual to another individual in the network. Conversely, other models accurately depict the actual spreading of the virus, also in terms of expected steps for transmission, but grow exponentially with the number of individuals, and thus are not feasible for accurate predictions in large networks, unless special symmetries in the graph reduce the computational burden.
In this work we thoroughly review both the aforementioned models, and we describe a sampling strategy that allows one to adopt the more accurate model, in terms of expected steps for infection, to also analyse larger graphs without incurring in an exponential growth of the state space. In addition, we introduce a novel indicator of criticality of nodes, which is based on the expected number of steps for infection to reach the entire population. A comparison of the properties of this indicator with other similar importance indicators is also provided.
Our work can be extended in a number of directions. First of all, while all the results we have provided have been obtained from a simple SI model, where individuals are either infected or susceptible, the same approach can be easily generalized to more sophisticated, and realistic, epidemic models where also recovered, exposed, or individuals at other stages of the infection, are considered. In these cases, we are interested in ascertaining whether the proposed random-walk based model, which is now evaluable also for larger networks, does provide further insights in the dynamics of a virus. In particular, it would be interesting to study if there is a correlation between the identified critical nodes via MICTs, and actual superspreaders and superspreading events in real networks, which are known to having played an important role in the spreading of a virus [44][45][46]. Finally, computational aspects, convergence to the actual MICTs, and analytical tight bounds on the covering times are all aspects which still partially remain open questions, and we are interested in addressing in the future.